Results 1 - 20 of 184
1.
bioRxiv ; 2024 Mar 16.
Article in English | MEDLINE | ID: mdl-38559242

ABSTRACT

Immunomodulatory imide drugs (IMiDs), including thalidomide, lenalidomide, and pomalidomide, can be used to induce degradation of a protein of interest that is fused to a short zinc finger (ZF) degron motif. These IMiDs, however, also induce degradation of endogenous neosubstrates, including IKZF1 and IKZF3. To improve degradation selectivity, we took a bump-and-hole approach to design and screen bumped IMiD analogs against 8380 ZF mutants. This yielded a bumped IMiD analog that induces efficient degradation of a mutant ZF degron, while not affecting other cellular proteins, including IKZF1 and IKZF3. In proof-of-concept studies, this system was applied to induce efficient degradation of TRIM28, a disease-relevant protein with no known small-molecule binders. We anticipate that this system will make a valuable addition to the current arsenal of degron systems for use in target validation.

2.
Nat Methods ; 2024 Apr 23.
Article in English | MEDLINE | ID: mdl-38654083

ABSTRACT

T cells are essential immune cells responsible for identifying and eliminating pathogens. Through interactions between their T-cell antigen receptors (TCRs) and antigens presented by major histocompatibility complex molecules (MHCs) or MHC-like molecules, T cells discriminate foreign and self peptides. Determining the fundamental principles that govern these interactions has important implications in numerous medical contexts. However, reconstructing a map between T cells and their antagonist antigens remains an open challenge for the field of immunology, and success of in silico reconstructions of this relationship has remained incremental. In this Perspective, we discuss the role that new state-of-the-art deep-learning models for predicting protein structure may play in resolving some of the unanswered questions the field faces linking TCR and peptide-MHC properties to T-cell specificity. We provide a comprehensive overview of structural databases and the evolution of predictive models, and highlight the breakthrough AlphaFold provided the field.

3.
Chem Sci ; 15(9): 3130-3139, 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38425520

ABSTRACT

The last few years have seen the development of numerous deep learning-based protein-ligand docking methods. They offer huge promise in terms of speed and accuracy. However, despite claims of state-of-the-art performance in terms of crystallographic root-mean-square deviation (RMSD), upon closer inspection, it has become apparent that they often produce physically implausible molecular structures. It is therefore not sufficient to evaluate these methods solely by RMSD to a native binding mode. It is vital, particularly for deep learning-based methods, that they are also evaluated on steric and energetic criteria. We present PoseBusters, a Python package that performs a series of standard quality checks using the well-established cheminformatics toolkit RDKit. The PoseBusters test suite validates chemical and geometric consistency of a ligand including its stereochemistry, and the physical plausibility of intra- and intermolecular measurements such as the planarity of aromatic rings, standard bond lengths, and protein-ligand clashes. Only methods that both pass these checks and predict native-like binding modes should be classed as having "state-of-the-art" performance. We use PoseBusters to compare five deep learning-based docking methods (DeepDock, DiffDock, EquiBind, TankBind, and Uni-Mol) and two well-established standard docking methods (AutoDock Vina and CCDC Gold) with and without an additional post-prediction energy minimisation step using a molecular mechanics force field. We show that both in terms of physical plausibility and the ability to generalise to examples that are distinct from the training data, no deep learning-based method yet outperforms classical docking tools. In addition, we find that molecular mechanics force fields contain docking-relevant physics missing from deep-learning methods. 
PoseBusters allows practitioners to assess docking and molecular generation methods and may inspire new inductive biases still required to improve deep learning-based methods, which will help drive the development of more accurate and more realistic predictions.
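
The argument that RMSD alone is not enough can be illustrated with a minimal sketch. This is not the PoseBusters implementation or its API: the function names, the 2.0 Å RMSD cutoff, and the 0.9 to 2.0 Å bond-length window are assumptions chosen only for illustration, and a single bond-length test stands in for the much broader set of checks the abstract describes.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def bond_lengths_plausible(coords, bonds, lower=0.9, upper=2.0):
    """True if every bonded atom pair is within [lower, upper] angstroms.

    The bounds are illustrative assumptions, not values taken from the paper.
    """
    return all(lower <= math.dist(coords[i], coords[j]) <= upper for i, j in bonds)

def pose_is_acceptable(pred, ref, bonds, rmsd_cutoff=2.0):
    """A pose must be both native-like (low RMSD) and physically plausible."""
    return rmsd(pred, ref) <= rmsd_cutoff and bond_lengths_plausible(pred, bonds)

# Hypothetical three-atom ligand with bonds 0-1 and 1-2.
ref = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
bonds = [(0, 1), (1, 2)]
shifted = [(x, y + 0.5, z) for x, y, z in ref]               # rigid shift, geometry intact
squashed = [(0.0, 0.0, 0.0), (0.3, 0.0, 0.0), (3.0, 0.0, 0.0)]  # atom 1 collapsed onto atom 0

print(pose_is_acceptable(shifted, ref, bonds))
print(pose_is_acceptable(squashed, ref, bonds))
```

Both hypothetical poses fall under the RMSD cutoff, but only the rigidly shifted one keeps sensible bond lengths, which is the kind of case an RMSD-only evaluation would miss.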

4.
J Cheminform ; 16(1): 32, 2024 Mar 14.
Article in English | MEDLINE | ID: mdl-38486231

ABSTRACT

Protein-ligand binding site prediction is a useful tool for understanding the functional behaviour and potential drug-target interactions of a novel protein of interest. However, most binding site prediction methods are tested by providing crystallised ligand-bound (holo) structures as input. This testing regime is insufficient to understand the performance on novel protein targets where experimental structures are not available. An alternative option is to provide computationally predicted protein structures, but this is not commonly tested. However, due to the training data used, computationally-predicted protein structures tend to be extremely accurate, and are often biased toward a holo conformation. In this study we describe and benchmark IF-SitePred, a protein-ligand binding site prediction method which is based on the labelling of ESM-IF1 protein language model embeddings combined with point cloud annotation and clustering. We show that not only is IF-SitePred competitive with state-of-the-art methods when predicting binding sites on experimental structures, but it performs better on proxies for novel proteins where low accuracy has been simulated by molecular dynamics. Finally, IF-SitePred outperforms other methods if ensembles of predicted protein structures are generated.

5.
Front Immunol ; 15: 1352703, 2024.
Article in English | MEDLINE | ID: mdl-38482007

ABSTRACT

Deep learning models have been shown to accurately predict protein structure from sequence, allowing researchers to explore protein space from the structural viewpoint. In this paper we explore whether "novel" features, such as distinct loop conformations can arise from these predictions despite not being present in the training data. Here we have used ABodyBuilder2, a deep learning antibody structure predictor, to predict the structures of ~1.5M paired antibody sequences. We examined the predicted structures of the canonical CDR loops and found that most of these predictions fall into the already described CDR canonical form structural space. We also found a small number of "new" canonical clusters composed of heterogeneous sequences united by a common sequence motif and loop conformation. Analysis of these novel clusters showed their origins to be either shapes seen in the training data at very low frequency or shapes seen at high frequency but at a shorter sequence length. To evaluate explicitly the ability of ABodyBuilder2 to extrapolate, we retrained several models whilst withholding all antibody structures of a specific CDR loop length or canonical form. These "starved" models showed evidence of generalisation across CDRs of different lengths, but they did not extrapolate to loop conformations which were highly distinct from those present in the training data. However, the models were able to accurately predict a canonical form even if only a very small number of examples of that shape were in the training data. Our results suggest that deep learning protein structure prediction methods are unable to make completely out-of-domain predictions for CDR loops. However, in our analysis we also found that even minimal amounts of data of a structural shape allow the method to recover its original predictive abilities. We have made the ~1.5 M predicted structures used in this study available to download at https://doi.org/10.5281/zenodo.10280181.


Subjects
Complementarity Determining Regions , Deep Learning , Complementarity Determining Regions/chemistry , Protein Conformation , Models, Molecular , Antibodies
6.
PLoS Comput Biol ; 20(3): e1011901, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38470915

ABSTRACT

A novel class of protein misfolding characterized by either the formation of non-native noncovalent lasso entanglements in the misfolded structure or loss of native entanglements has been predicted to exist and found circumstantial support through biochemical assays and limited-proteolysis mass spectrometry data. Here, we examine whether it is possible to design small molecule compounds that can bind to specific folding intermediates and thereby avoid these misfolded states in computer simulations under idealized conditions (perfect drug-binding specificity, zero promiscuity, and a smooth energy landscape). Studying two proteins, type III chloramphenicol acetyltransferase (CAT-III) and D-alanyl-D-alanine ligase B (DDLB), that were previously suggested to form soluble misfolded states through a mechanism involving a failure-to-form of native entanglements, we explore two different drug design strategies using coarse-grained structure-based models. The first strategy, in which the native entanglement is stabilized by drug binding, failed to decrease misfolding because it formed an alternative entanglement at a nearby region. The second strategy, in which a small molecule was designed to bind to a non-native tertiary structure and thereby destabilize the native entanglement, succeeded in decreasing misfolding and increasing the native state population. This strategy worked because destabilizing the entanglement loop provided more time for the threading segment to position itself correctly to be wrapped by the loop to form the native entanglement. Further, we computationally identified several FDA-approved drugs with the potential to bind these intermediate states and rescue misfolding in these proteins. This study suggests it is possible for small molecule drugs to prevent protein misfolding of this type.


Subjects
Protein Folding , Proteins , Proteins/chemistry , Computer Simulation , Software , Mass Spectrometry
7.
Proc Natl Acad Sci U S A ; 121(7): e2311049121, 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38319973

ABSTRACT

Intrathecal synthesis of central nervous system (CNS)-reactive autoantibodies is observed across patients with autoimmune encephalitis (AE), who show multiple residual neurobehavioral deficits and relapses despite immunotherapies. We leveraged two common forms of AE, mediated by leucine-rich glioma inactivated-1 (LGI1) and contactin-associated protein-like 2 (CASPR2) antibodies, as human models to comprehensively reconstruct and profile cerebrospinal fluid (CSF) B cell receptor (BCR) characteristics. We hypothesized that the resultant observations would both inform the observed therapeutic gap and determine the contribution of intrathecal maturation to pathogenic B cell lineages. From the CSF of three patients, 381 cognate-paired IgG BCRs were isolated by cell sorting and scRNA-seq, and 166 expressed as monoclonal antibodies (mAbs). Sixty-two percent of mAbs from singleton BCRs reacted with either LGI1 or CASPR2 and, strikingly, this rose to 100% of cells in clonal groups with ≥4 members. These autoantigen-reactivities were more concentrated within antibody-secreting cells (ASCs) versus B cells (P < 0.0001), and both these cell types were more differentiated than LGI1- and CASPR2-unreactive counterparts. Despite greater differentiation, autoantigen-reactive cells had acquired few mutations intrathecally and showed minimal variation in autoantigen affinities within clonal expansions. Also, limited CSF T cell receptor clonality was observed. In contrast, a comparison of germline-encoded BCRs versus the founder intrathecal clone revealed marked gains in both affinity and mutational distances (P = 0.004 and P < 0.0001, respectively). Taken together, in patients with LGI1 and CASPR2 antibody encephalitis, our results identify CSF as a compartment with a remarkably high frequency of clonally expanded autoantigen-reactive ASCs whose BCR maturity appears dominantly acquired outside the CNS.


Subjects
Autoimmune Diseases of the Nervous System , Encephalitis , Glioma , Hashimoto Disease , Humans , Leucine , Intracellular Signaling Peptides and Proteins , Neoplasm Recurrence, Local , Autoantibodies , Autoantigens
8.
Commun Biol ; 7(1): 62, 2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38191620

ABSTRACT

Antibodies with lambda light chains (λ-antibodies) are generally considered to be less developable than those with kappa light chains (κ-antibodies). Though this hypothesis has not been formally established, it has led to substantial systematic biases in drug discovery pipelines and thus contributed to kappa dominance amongst clinical-stage therapeutics. However, the identification of increasing numbers of epitopes preferentially engaged by λ-antibodies shows there is a functional cost to neglecting to consider them as potential lead candidates. Here, we update our Therapeutic Antibody Profiler (TAP) tool to use the latest data and machine learning-based structure prediction, and apply it to evaluate developability risk profiles for κ-antibodies and λ-antibodies based on their surface physicochemical properties. We find that while human λ-antibodies on average have a higher risk of developability issues than κ-antibodies, a sizeable proportion are assigned lower-risk profiles by TAP and should represent more tractable candidates for therapeutic development. Through a comparative analysis of the low- and high-risk populations, we highlight opportunities for strategic design that TAP suggests would enrich for more developable λ-antibodies. Overall, we provide context to the differing developability of κ- and λ-antibodies, enabling a rational approach to incorporate more diversity into the initial pool of immunotherapeutic candidates.


Subjects
Antibodies , Drug Discovery , Humans , Antibodies/therapeutic use , Epitopes , Machine Learning , Surface Properties
9.
Nucleic Acids Res ; 52(D1): D545-D551, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971316

ABSTRACT

Antibodies are key proteins of the adaptive immune system, and there exists a large body of academic literature and patents dedicated to their study and concomitant conversion into therapeutics, diagnostics, or reagents. These documents often contain extensive functional characterisations of the sets of antibodies they describe. However, leveraging these heterogeneous reports, for example to offer insights into the properties of query antibodies of interest, is currently challenging as there is no central repository through which this wide corpus can be mined by sequence or structure. Here, we present PLAbDab (the Patent and Literature Antibody Database), a self-updating repository containing over 150,000 paired antibody sequences and 3D structural models, of which over 65,000 are unique. We describe the methods used to extract, filter, pair, and model the antibodies in PLAbDab, and showcase how PLAbDab can be searched by sequence, structure, or keyword. PLAbDab uses include annotating query antibodies with potential antigen information from similar entries, analysing structural models of existing antibodies to identify modifications that could improve their properties, and facilitating the compilation of bespoke datasets of antibody sequences/structures that bind to a specific antigen. PLAbDab is freely available via GitHub (https://github.com/oxpig/PLAbDab) and as a searchable webserver (https://opig.stats.ox.ac.uk/webapps/plabdab/).


Subjects
Antibodies , Databases, Factual , Antibodies/chemistry , Antibodies/genetics , Antigens/metabolism , Models, Molecular , Patents as Topic , Internet
10.
Nat Biotechnol ; 42(2): 185-186, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37845572
12.
J Chem Inf Model ; 63(22): 6964-6971, 2023 Nov 27.
Article in English | MEDLINE | ID: mdl-37934909

ABSTRACT

The electrostatic properties of proteins arise from the number and distribution of polar and charged residues. Electrostatic interactions in proteins play a critical role in numerous processes such as molecular recognition, protein solubility, viscosity, and antibody developability. Thus, characterizing and quantifying electrostatic properties of a protein are prerequisites for understanding these processes. Here, we present PEP-Patch, a tool to visualize and quantify the electrostatic potential on the protein surface in terms of surface patches, denoting separated areas of the surface with a common physical property. We highlight its applicability to elucidate protease substrate specificity and antibody-antigen recognition and predict heparin column retention times of antibodies as an indicator of pharmacokinetics.


Subjects
Antibodies , Proteins , Static Electricity , Proteins/chemistry , Solubility , Viscosity
13.
Front Mol Biosci ; 10: 1237621, 2023.
Article in English | MEDLINE | ID: mdl-37790877

ABSTRACT

The function of an antibody is intrinsically linked to the epitope it engages. Clonal clustering methods, based on sequence identity, are commonly used to group antibodies that will bind to the same epitope. However, such methods neglect the fact that antibodies with highly diverse sequences can exhibit similar binding site geometries and engage common epitopes. In a previous study, we described SPACE1, a method that structurally clustered antibodies in order to predict their epitopes. This methodology was limited by the inaccuracies and incomplete coverage of template-based modeling. In addition, it was only benchmarked at the level of domain-consistency on one virus class. Here, we present SPACE2, which uses the latest machine learning-based structure prediction technology combined with a novel clustering protocol, and benchmark it on binding data that have epitope-level resolution. On six diverse sets of antigen-specific antibodies, we demonstrate that SPACE2 accurately clusters antibodies that engage common epitopes and achieves far higher dataset coverage than clonal clustering and SPACE1. Furthermore, we show that the functionally consistent structural clusters identified by SPACE2 are even more diverse in sequence, genetic lineage, and species origin than those found by SPACE1. These results reiterate that structural data improve our ability to identify antibodies that bind to the same epitope, adding information to sequence-based methods, especially in datasets of antibodies from diverse sources. SPACE2 is openly available on GitHub (https://github.com/oxpig/SPACE2).

14.
J Cheminform ; 15(1): 84, 2023 Sep 19.
Article in English | MEDLINE | ID: mdl-37726844

ABSTRACT

Many recently proposed structure-based virtual screening models appear to be able to accurately distinguish high-affinity binders from non-binders. However, several recent studies have shown that they often do so by exploiting ligand-specific biases in the dataset, rather than identifying favourable intermolecular interactions in the input protein-ligand complex. In this work we propose a novel approach for assessing the extent to which machine learning-based virtual screening models are able to identify the functional groups responsible for binding. To sidestep the difficulty in establishing the ground truth importance of each atom of a large-scale set of protein-ligand complexes, we propose a protocol for generating synthetic data. Each ligand in the dataset is surrounded by a randomly sampled point cloud of pharmacophores, and the label assigned to the synthetic protein-ligand complex is determined by a 3-dimensional deterministic binding rule. This allows us to precisely quantify the ground truth importance of each atom and compare it to the model-generated attributions. Using our generated datasets, we demonstrate that a recently proposed deep learning-based virtual screening model, PointVS, identified the most important functional groups with 39% more efficiency than a fingerprint-based random forest, suggesting that it would generalise more effectively to new examples. In addition, we found that ligand-specific biases, such as those present in widely used virtual screening datasets, substantially impaired the ability of all ML models to identify the most important functional groups. We have made our synthetic data generation framework available to facilitate the benchmarking of new virtual screening models. Code is available at https://github.com/tomhadfield95/synthVS.
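
The synthetic-data idea described above can be sketched in a few lines. This is a toy illustration and not the synthVS code: the donor/acceptor typing, the 2.0 Å distance cutoff, the box size, and both function names are hypothetical choices made here. The point it shows is that a deterministic rule gives a known per-atom ground truth against which any attribution scores can be compared.

```python
import math
import random

def make_example(n_atoms=6, n_points=10, cutoff=2.0, box=6.0, rng=random):
    """Build one synthetic complex; all names and defaults here are illustrative."""
    def random_site():
        # "D" = donor, "A" = acceptor; position is uniform in a cube of side `box`.
        return rng.choice("DA"), tuple(rng.uniform(0.0, box) for _ in range(3))

    ligand = [random_site() for _ in range(n_atoms)]
    cloud = [random_site() for _ in range(n_points)]

    # Deterministic rule: an atom matters if a complementary point lies within `cutoff`.
    important = [
        any(ptype != atype and math.dist(apos, ppos) <= cutoff for ptype, ppos in cloud)
        for atype, apos in ligand
    ]
    label = int(any(important))  # the complex counts as a binder if any atom satisfies the rule
    return ligand, cloud, important, label

def attribution_hit_rate(important, scores):
    """Share of truly important atoms among the k highest-scored atoms (k = number important)."""
    k = sum(important)
    if k == 0:
        return None  # nothing to recover in a non-binder
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sum(important[i] for i in top) / k

ligand, cloud, important, label = make_example(rng=random.Random(0))
print(label, important)
print(attribution_hit_rate([True, False, True, False], [0.9, 0.1, 0.2, 0.8]))
```

Because the labelling rule is known, a model's attributions can be scored directly, without relying on expert annotation of real complexes.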

15.
Nat Commun ; 14(1): 5763, 2023 Sep 16.
Article in English | MEDLINE | ID: mdl-37717048

ABSTRACT

CC and CXC-chemokines are the primary drivers of chemotaxis in inflammation, but chemokine network redundancy thwarts pharmacological intervention. Tick evasins promiscuously bind CC and CXC-chemokines, overcoming redundancy. Here we show that short peptides that promiscuously bind both chemokine classes can be identified from evasins by phage-display screening performed with multiple chemokines in parallel. We identify two conserved motifs within these peptides and show using saturation-mutagenesis phage-display and chemotaxis studies of an exemplar peptide that an anionic patch in the first motif and hydrophobic, aromatic and cysteine residues in the second are functionally necessary. AlphaFold2-Multimer modelling suggests that the peptide occludes distinct receptor-binding regions in CC and in CXC-chemokines, with the first and second motifs contributing ionic and hydrophobic interactions respectively. Our results indicate that peptides with broad-spectrum anti-chemokine activity and therapeutic potential may be identified from evasins, and the pharmacophore characterised by phage display, saturation mutagenesis and computational modelling.


Subjects
Bacteriophages , Chemokines , Chemical Phenomena , Computer Simulation , Mutagenesis
16.
Front Immunol ; 14: 1223802, 2023.
Article in English | MEDLINE | ID: mdl-37564639

ABSTRACT

Antibodies, through their ability to target virtually any epitope, play a key role in driving the adaptive immune response in jawed vertebrates. The binding domains of standard antibodies are their variable light (VL) and heavy (VH) domains, both of which present analogous complementarity-determining region (CDR) loops. It has long been known that the VH CDRs contribute more heavily to the antigen-binding surface (paratope), with the CDR-H3 loop providing a major modality for the generation of diverse paratopes. Here, we provide evidence for an additional role of the VL domain as a modulator of CDR-H3 structure, using a diverse set of antibody crystal structures and a large set of molecular dynamics simulations. We show that specific attributes of the VL domain such as subtypes, CDR canonical forms and genes can influence the structural diversity of the CDR-H3 loop, and provide a physical model for how this effect occurs through inter-loop contacts and packing of CDRs against each other. Our results indicate that the rigid minor loops fine-tune the structure of CDR-H3, thereby contributing to the generation of surfaces complementary to the vast number of possible epitope topologies, and provide insights into the interdependent nature of CDR conformations, an understanding of which is important for the rational antibody design process.

17.
J Proteome Res ; 22(9): 2959-2972, 2023 Sep 01.
Article in English | MEDLINE | ID: mdl-37582225

ABSTRACT

Proteins often undergo structural perturbations upon binding to other proteins or ligands or when they are subjected to environmental changes. Hydrogen-deuterium exchange mass spectrometry (HDX-MS) can be used to explore conformational changes in proteins by examining differences in the rate of deuterium incorporation in different contexts. To determine deuterium incorporation rates, HDX-MS measurements are typically made over a time course. Recently introduced methods show that incorporating the temporal dimension into the statistical analysis improves power and interpretation. However, these approaches have technical assumptions that hinder their flexibility. Here, we propose a more flexible methodology by reframing these methods in a Bayesian framework. Our proposed framework has improved algorithmic stability, allows us to perform uncertainty quantification, and can calculate statistical quantities that are inaccessible to other approaches. We demonstrate the general applicability of the method by showing that it can perform rigorous model selection on a spike-in HDX-MS experiment, improve interpretation in an epitope mapping experiment, and increase sensitivity in a small-molecule case study. Bayesian analysis of an HDX experiment with an antibody dimer bound to an E3 ubiquitin ligase identifies at least two interaction interfaces where previous methods obtained confounding results due to the complexities of conformational changes on binding. Our findings are consistent with the cocrystal structure of these proteins, demonstrating that a Bayesian approach can identify important binding epitopes from HDX data. We also generate HDX-MS data of the bromodomain-containing protein BRD4 in complex with GSK1210151A to demonstrate the increased sensitivity of adopting a Bayesian approach.


Subjects
Deuterium Exchange Measurement , Hydrogen Deuterium Exchange-Mass Spectrometry , Bayes Theorem , Deuterium/chemistry , Deuterium Exchange Measurement/methods , Nuclear Proteins , Mass Spectrometry/methods , Transcription Factors
18.
Front Immunol ; 14: 1231623, 2023.
Article in English | MEDLINE | ID: mdl-37533864

ABSTRACT

Antibodies are the largest class of biotherapeutics. However, in recent years, single-domain antibodies have gained traction due to their smaller size and comparable binding affinity. Antibodies (Abs) and single-domain antibodies (sdAbs) differ in the structures of their binding sites: most significantly, single-domain antibodies lack a light chain and so have just three CDR loops. Given this inherent structural difference, it is important to understand whether Abs and sdAbs are distinguishable in how they engage a binding partner and thus, whether they are suited to different types of epitopes. In this study, we use non-redundant sequence and structural datasets to compare the paratopes, epitopes and antigen interactions of Abs and sdAbs. We demonstrate that even though sdAbs have smaller paratopes, they target epitopes of equal size to those targeted by Abs. To achieve this, the paratopes of sdAbs contribute more interactions per residue than the paratopes of Abs. Additionally, we find that conserved framework residues are of increased importance in the paratopes of sdAbs, suggesting that they include non-specific interactions to achieve comparable affinity. Furthermore, the epitopes of sdAbs are only marginally less accessible than those of Abs: we posit that this may be explained by differences in the orientation and compaction of sdAb and Ab CDR-H3 loops. Overall, our results have important implications for the engineering and humanization of sdAbs, as well as the selection of the best modality for targeting a particular epitope.


Subjects
Single-Domain Antibodies , Antibodies , Binding Sites , Epitopes , Antigens
19.
Sci Rep ; 13(1): 11612, 2023 Jul 18.
Article in English | MEDLINE | ID: mdl-37463925

ABSTRACT

Antibodies with similar amino acid sequences, especially across their complementarity-determining regions, often share properties. Finding that an antibody of interest has a similar sequence to naturally expressed antibodies in healthy or diseased repertoires is a powerful approach for the prediction of antibody properties, such as immunogenicity or antigen specificity. However, as the number of available antibody sequences is now in the billions and continuing to grow, repertoire mining for similar sequences has become increasingly computationally expensive. Existing approaches are limited by either being low-throughput, non-exhaustive, not antibody-specific, or only searching against entire chain sequences. Therefore, there is a need for a specialized tool, optimized for a rapid and exhaustive search of any antibody region against all known antibodies, to better utilize the full breadth of available repertoire sequences. We introduce Known Antibody Search (KA-Search), a tool that allows for the rapid search of billions of antibody variable domains by amino acid sequence identity across either the variable domain, the complementarity-determining regions, or a user-defined antibody region. We show KA-Search in operation on the ~2.4 billion antibody sequences available in the OAS database. KA-Search can be used to find the most similar sequences from OAS within 30 minutes and a representative subset of 10 million sequences in less than 9 seconds. We give examples of how KA-Search can be used to obtain new insights about an antibody of interest. KA-Search is freely available at https://github.com/oxpig/kasearch.
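
Searching by sequence identity over a chosen region can be sketched as a brute-force scan. This is not KA-Search's API or algorithm: the function names, the toy sequences, and the use of a Python slice to select a region are assumptions for illustration, and the sketch presumes all sequences have already been aligned to a common numbering so that equal positions are comparable.

```python
import heapq

def identity(a, b):
    """Fraction of identical positions between two pre-aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def top_matches(query, database, region=slice(None), k=3):
    """Return the k (identity, name) pairs with highest identity over `region`.

    `region` is a slice over aligned positions: the whole sequence by default,
    or a sub-range standing in for a CDR or a user-defined region.
    """
    scored = ((identity(query[region], seq[region]), name) for name, seq in database.items())
    return heapq.nlargest(k, scored)

# Hypothetical aligned fragments; real variable domains are far longer.
database = {"ab1": "QVQLVQ", "ab2": "EVQLVE", "ab3": "QVKLLE"}
query = "QVQLVE"

print(top_matches(query, database, k=2))                      # whole-sequence identity
print(top_matches(query, database, region=slice(4, 6), k=1))  # identity over positions 4-5 only
```

A linear scan like this scales poorly to billions of sequences, which is the cost the abstract says a specialised tool is needed to address.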


Subjects
Antibodies , Complementarity Determining Regions , Complementarity Determining Regions/chemistry , Amino Acid Sequence
20.
J Chem Inf Model ; 63(10): 2960-2974, 2023 May 22.
Article in English | MEDLINE | ID: mdl-37166179

ABSTRACT

Over the past few years, many machine learning-based scoring functions for predicting the binding of small molecules to proteins have been developed. Their objective is to approximate the distribution which takes two molecules as input and outputs the energy of their interaction. Only a scoring function that accounts for the interatomic interactions involved in binding can accurately predict binding affinity on unseen molecules. However, many scoring functions make predictions based on data set biases rather than an understanding of the physics of binding. These scoring functions perform well when tested on similar targets to those in the training set but fail to generalize to dissimilar targets. To test what a machine learning-based scoring function has learned, input attribution, a technique for learning which features are important to a model when making a prediction on a particular data point, can be applied. If a model successfully learns something beyond data set biases, attribution should give insight into the important binding interactions that are taking place. We built a machine learning-based scoring function that aimed to avoid the influence of bias via thorough train and test data set filtering and show that it achieves comparable performance on the Comparative Assessment of Scoring Functions, 2016 (CASF-2016) benchmark to other leading methods. We then use the CASF-2016 test set to perform attribution and find that the bonds identified as important by PointVS, unlike those extracted from other scoring functions, have a high correlation with those found by a distance-based interaction profiler. We then show that attribution can be used to extract important binding pharmacophores from a given protein target when supplied with a number of bound structures. We use this information to perform fragment elaboration and see improvements in docking scores compared to using structural information from a traditional, data-based approach. 
This not only provides definitive proof that the scoring function has learned to identify some important binding interactions but also constitutes the first deep learning-based method for extracting structural information from a target for molecule design.


Subjects
Machine Learning , Proteins , Protein Binding , Ligands , Proteins/chemistry , Databases, Protein , Molecular Docking Simulation